Persistently Optimal Policies in Stochastic Dynamic Programming with Generalized Discounting
Similar sources
Persistently Optimal Policies in Stochastic Dynamic Programming with Generalized Discounting
In this paper we study a Markov decision process with a non-linear discount function. Our approach is in the spirit of the von Neumann-Morgenstern concept and is based on the notion of expectation. First, we define a utility on the space of trajectories of the process over finite and infinite time horizons and then take their expected values. It turns out that the associated optimization problem l...
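To make "a utility on the space of trajectories" concrete, here is a minimal Python sketch. It assumes the recursive form of generalized discounting u_1 + δ(u_2 + δ(u_3 + ... + δ(u_n))) for a non-linear discount function δ; that form is an assumption on our part, not quoted from the abstract. It also assumes a fixed stationary policy, so rewards depend only on the state of a finite Markov chain, and it computes the expected utility exactly by enumerating all n-step trajectories. With δ(t) = βt the recursion reduces to ordinary discounting. The names trajectory_utility and expected_utility are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

Delta = Callable[[float], float]


def trajectory_utility(rewards: Sequence[float], delta: Delta) -> float:
    """Return u_1 + delta(u_2 + delta(u_3 + ... + delta(u_n))) for rewards u_1..u_n (n >= 1)."""
    acc = rewards[-1]
    # Fold from the last stage backwards, applying delta to the utility of the tail.
    for r in reversed(rewards[:-1]):
        acc = r + delta(acc)
    return acc


def expected_utility(
    P: Sequence[Sequence[float]],
    reward: Sequence[float],
    x0: int,
    n: int,
    delta: Delta,
) -> float:
    """Exact expected trajectory utility over all n-step paths of a finite Markov chain.

    Hypothetical setting: P[x][y] is the transition probability under a fixed
    stationary policy, and reward[x] is the one-stage reward in state x.
    """
    total = 0.0
    for tail in product(range(len(P)), repeat=n - 1):
        path = (x0,) + tail
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        total += prob * trajectory_utility([reward[x] for x in path], delta)
    return total


if __name__ == "__main__":
    P = [[0.7, 0.3], [0.4, 0.6]]
    reward = [1.0, 2.0]

    def linear(t: float) -> float:
        return 0.9 * t  # standard discounting with beta = 0.9

    def nonlinear(t: float) -> float:
        return 0.9 * t / (1.0 + 0.1 * abs(t))  # an illustrative non-linear delta

    print(expected_utility(P, reward, 0, 4, linear))
    print(expected_utility(P, reward, 0, 4, nonlinear))
```

Because δ is applied inside the expectation, a non-linear δ generally gives a value that differs from discounting expected rewards stage by stage, since the expectation of δ(X) need not equal δ applied to the expectation of X.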
Regular Policies in Stochastic Optimal Control and Abstract Dynamic Programming
Notation: Connection with Abstract DP. Mapping of a stationary policy $\mu$: for any control function $\mu$, with $\mu(x) \in U(x)$ for all $x$, and $J \in E(X)$, define the mapping $T_\mu : E(X) \mapsto E(X)$ by $(T_\mu J)(x) = E\{g(x, \mu(x), w) + \alpha J(f(x, \mu(x), w))\}$, $x \in X$. Value iteration mapping: for any $J \in E(X)$, define the mapping $T : E(X) \mapsto E(X)$ by $(TJ)(x) = \inf_{u \in U(x)} E\{...
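The two mappings in this excerpt can be sketched in Python for a finite model. The sketch assumes finitely many states 0..n-1, a finite control set U(x), and a disturbance w with a finite distribution given as (w, probability) pairs; functions in E(X) are represented as plain lists, so the infimum over U(x) becomes a minimum. The helpers q_value, value_iteration and greedy_policy are illustrative names, not part of the excerpt.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical finite model: states 0..n-1, finite controls, finite disturbances.
Controls = Callable[[int], Sequence[int]]  # U(x)
NextState = Callable[[int, int, int], int]  # f(x, u, w)
Cost = Callable[[int, int, int], float]  # g(x, u, w)
Disturbances = Sequence[Tuple[int, float]]  # pairs (w, probability)


def q_value(J: List[float], x: int, u: int, f: NextState, g: Cost,
            w_dist: Disturbances, alpha: float) -> float:
    """E{g(x, u, w) + alpha * J(f(x, u, w))} over the finite disturbance distribution."""
    return sum(p * (g(x, u, w) + alpha * J[f(x, u, w)]) for w, p in w_dist)


def T_mu(J: List[float], mu: List[int], f: NextState, g: Cost,
         w_dist: Disturbances, alpha: float) -> List[float]:
    """(T_mu J)(x) = E{g(x, mu(x), w) + alpha * J(f(x, mu(x), w))}."""
    return [q_value(J, x, mu[x], f, g, w_dist, alpha) for x in range(len(J))]


def T(J: List[float], U: Controls, f: NextState, g: Cost,
      w_dist: Disturbances, alpha: float) -> List[float]:
    """(T J)(x) = min over u in U(x) of E{g(x, u, w) + alpha * J(f(x, u, w))}."""
    return [min(q_value(J, x, u, f, g, w_dist, alpha) for u in U(x)) for x in range(len(J))]


def value_iteration(J0: List[float], U: Controls, f: NextState, g: Cost,
                    w_dist: Disturbances, alpha: float,
                    tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Iterate J <- T J until successive iterates differ by less than tol in sup norm."""
    J = list(J0)
    for _ in range(max_iter):
        J_next = T(J, U, f, g, w_dist, alpha)
        if max(abs(a - b) for a, b in zip(J_next, J)) < tol:
            return J_next
        J = J_next
    return J


def greedy_policy(J: List[float], U: Controls, f: NextState, g: Cost,
                  w_dist: Disturbances, alpha: float) -> List[int]:
    """A policy mu with T_mu J = T J (ties go to the first minimizing control)."""
    return [min(U(x), key=lambda u: q_value(J, x, u, f, g, w_dist, alpha))
            for x in range(len(J))]


if __name__ == "__main__":
    # Toy model: u = 0 stays, u = 1 switches state; state 1 costs 1 per stage, switching costs 0.5.
    def U(x: int) -> Sequence[int]:
        return (0, 1)

    def f(x: int, u: int, w: int) -> int:
        return x if u == 0 else 1 - x

    def g(x: int, u: int, w: int) -> float:
        return float(x) + 0.5 * u

    w_dist = [(0, 1.0)]  # a single, deterministic disturbance
    J = value_iteration([0.0, 0.0], U, f, g, w_dist, alpha=0.9)
    print(J, greedy_policy(J, U, f, g, w_dist, alpha=0.9))
```

In the toy model, value iteration reaches J = [0.0, 1.5] after a few sweeps, and the greedy policy stays in state 0 and switches out of state 1; for that policy, T_mu J coincides with T J.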
Convergence of Sample Path Optimal Policies for Stochastic Dynamic Programming
We consider the solution of stochastic dynamic programs using sample path estimates. Applying the theory of large deviations, we derive probability error bounds associated with the convergence of the estimated optimal policy to the true optimal policy, for finite horizon problems. These bounds decay at an exponential rate, in contrast with the usual canonical (inverse) square root rate associat...
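The setting in this abstract, where exact expectations in a finite-horizon problem are replaced by sample path estimates, can be sketched as follows. All specifics are assumptions of ours: a finite model with known transition matrices P[a][s][y] and stage costs c[s][a], zero terminal cost, and n_samples simulated next states per state-action pair turned into empirical frequencies. Both the exact and the estimated models are solved by backward induction and their policies compared. This only illustrates the setting; it does not reproduce the paper's large-deviation error bounds, and backward_induction and empirical_transitions are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def backward_induction(P: Sequence[Matrix], c: Matrix,
                       horizon: int) -> Tuple[List[List[int]], List[float]]:
    """Finite-horizon cost minimization with zero terminal cost.

    P[a][s][y] is the probability of moving from s to y under action a, and c[s][a]
    is the stage cost. Returns (policy, V0) where policy[t][s] is the action at stage t.
    """
    n_states, n_actions = len(c), len(c[0])
    V = [0.0] * n_states  # terminal cost-to-go
    policy: List[List[int]] = []
    for _ in range(horizon):
        Q = [[c[s][a] + sum(P[a][s][y] * V[y] for y in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        stage = [min(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
        V = [Q[s][stage[s]] for s in range(n_states)]
        policy.append(stage)
    policy.reverse()  # stages were computed from the last one backwards
    return policy, V


def empirical_transitions(P: Sequence[Matrix], n_samples: int,
                          rng: random.Random) -> List[Matrix]:
    """Replace each row P[a][s] by the frequencies of n_samples simulated next states."""
    n_states = len(P[0])
    P_hat = []
    for P_a in P:
        rows = []
        for row in P_a:
            draws = rng.choices(range(n_states), weights=row, k=n_samples)
            rows.append([draws.count(y) / n_samples for y in range(n_states)])
        P_hat.append(rows)
    return P_hat


if __name__ == "__main__":
    P = [[[0.9, 0.1], [0.5, 0.5]],  # action 0
         [[0.2, 0.8], [0.1, 0.9]]]  # action 1
    c = [[1.0, 3.0], [4.0, 2.0]]
    true_policy, _ = backward_induction(P, c, horizon=3)
    estimated_policy, _ = backward_induction(
        empirical_transitions(P, 1000, random.Random(0)), c, horizon=3)
    print(true_policy, estimated_policy)
```

Repeating the estimate with different seeds and sample sizes and counting how often the estimated policy differs from the true one gives an empirical view of the convergence the abstract studies.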
Stochastic Dynamic Programming with Markov Chains for Optimal Sustainable Control of the Forest Sector with Continuous Cover Forestry
We present a stochastic dynamic programming approach with Markov chains for optimal control of the forest sector. The forest is managed via continuous cover forestry and the complete system is sustainable. Forest industry production, logistic solutions and harvest levels are optimized based on the sequentially revealed states of the markets. Adaptive full system optimization is necessary for co...
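The core idea here, decisions optimized on sequentially revealed market states that follow a Markov chain, can be illustrated with a toy model. Everything in it is hypothetical: a single forest stock in integer units that grows by a fixed amount per period up to a cap, a market state with transition matrix M, revenue equal to price times harvest, and a discount factor beta. It omits what the paper actually models (continuous cover forestry, industry production, logistics, sustainability constraints) and only shows how a harvest policy can depend on the observed market state.

```python
from typing import List, Sequence, Tuple

Table = List[List[float]]


def solve_harvest(prices: Sequence[float], M: Sequence[Sequence[float]], max_stock: int,
                  growth: int, horizon: int, beta: float) -> Tuple[List[List[List[int]]], Table]:
    """Finite-horizon SDP over states (stock, market); the market follows the Markov chain M.

    Returns (policy, V0): policy[t][s][m] is the harvest at stage t with stock s in market
    state m, and V0[s][m] is the optimal expected discounted revenue at stage 0.
    """
    n_m = len(prices)
    V: Table = [[0.0] * n_m for _ in range(max_stock + 1)]  # zero terminal value
    policy: List[List[List[int]]] = []
    for _ in range(horizon):
        V_new: Table = [[0.0] * n_m for _ in range(max_stock + 1)]
        stage = [[0] * n_m for _ in range(max_stock + 1)]
        for s in range(max_stock + 1):
            for m in range(n_m):
                best_val, best_h = float("-inf"), 0
                for h in range(s + 1):  # harvest between 0 and the current stock
                    nxt = min(max_stock, s - h + growth)
                    future = sum(M[m][m2] * V[nxt][m2] for m2 in range(n_m))
                    val = prices[m] * h + beta * future
                    if val > best_val:
                        best_val, best_h = val, h
                V_new[s][m] = best_val
                stage[s][m] = best_h
        V = V_new
        policy.append(stage)
    policy.reverse()  # policy[0] is the first decision stage
    return policy, V


if __name__ == "__main__":
    prices = [1.0, 3.0]  # low and high market
    M = [[0.7, 0.3], [0.4, 0.6]]
    policy, V0 = solve_harvest(prices, M, max_stock=4, growth=1, horizon=5, beta=0.95)
    print(policy[0])
    print(V0)
```

In the last stage there is no future value, so the sketch harvests the whole stock there; earlier stages trade current revenue against the chance of a better market next period.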
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used to model multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, an algorithm based on generalized learning automata is proposed for finding optimal policies in MMDPs. In the proposed algorithm, MMDP ...
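A generalized learning automaton chooses actions with probabilities that depend on a context (here, the state) through adjustable parameters. A minimal sketch of one automaton, assuming softmax action probabilities over tabular parameters theta[s][a] and the REINFORCE-style update theta <- theta + step * beta * grad ln g(s, a, theta) with reinforcement beta in [0, 1]; the bounding and perturbation terms that some GLA analyses add are omitted. This is not the paper's MMDP algorithm: a single automaton faces a toy one-step problem with a hypothetical reward function, and the class and function names are illustrative.

```python
import math
import random
from typing import Callable, List


class GeneralizedLearningAutomaton:
    """Softmax action probabilities g(s, a, theta) with tabular parameters theta[s][a]."""

    def __init__(self, n_states: int, n_actions: int, step: float, rng: random.Random) -> None:
        self.theta: List[List[float]] = [[0.0] * n_actions for _ in range(n_states)]
        self.step = step
        self.rng = rng

    def probs(self, s: int) -> List[float]:
        z = self.theta[s]
        top = max(z)  # subtract the max for numerical stability
        e = [math.exp(v - top) for v in z]
        total = sum(e)
        return [v / total for v in e]

    def act(self, s: int) -> int:
        p = self.probs(s)
        return self.rng.choices(range(len(p)), weights=p, k=1)[0]

    def update(self, s: int, a: int, beta: float) -> None:
        # Gradient of ln softmax_a with respect to theta[s][j] is 1[j == a] - p_j.
        p = self.probs(s)
        for j, p_j in enumerate(p):
            self.theta[s][j] += self.step * beta * ((1.0 if j == a else 0.0) - p_j)


def train(reward: Callable[[int, int], float], n_states: int, n_actions: int,
          steps: int, step: float, seed: int) -> GeneralizedLearningAutomaton:
    """Hypothetical loop: states drawn uniformly, one action and one reinforcement per step."""
    rng = random.Random(seed)
    gla = GeneralizedLearningAutomaton(n_states, n_actions, step, rng)
    for _ in range(steps):
        s = rng.randrange(n_states)
        a = gla.act(s)
        gla.update(s, a, reward(s, a))
    return gla


if __name__ == "__main__":
    # Toy reinforcement: action 1 pays off in state 0, action 0 in state 1.
    gla = train(lambda s, a: 1.0 if a == 1 - s else 0.0, n_states=2, n_actions=2,
                steps=4000, step=0.1, seed=1)
    print(gla.probs(0), gla.probs(1))
```

With 0/1 reinforcement the parameters change only after a rewarded action, which gives the update a reward-inaction flavor: the probability of the rewarded action in each state can only grow.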
Journal
Journal title: Mathematics of Operations Research
Year: 2013
ISSN: 0364-765X, 1526-5471
DOI: 10.1287/moor.1120.0561